Understanding Parallelism in Computer Programs
Exploring how to identify and classify parallelism in programs based on grain size
This classification is based on recognizing the parallelism in a program to be executed on a multiprocessor system. The idea is to identify the sub-tasks or instructions in a program that can be executed in parallel.
For example, suppose a program contains three statements, and statements S1 and S2 can be interchanged without affecting the result. That means they are not order-dependent, so S1 and S2 can be executed in parallel, as shown below.
Sequential Execution:
S1: a = b + c
S2: d = e + f
S3: g = a + d
Parallel Execution:
S1: a = b + c S2: d = e + f
↓ ↓
S3: g = a + d
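The same structure can be sketched in code. The example below (plain JavaScript with hypothetical values; Promise.all merely stands in for real parallel hardware) starts S1 and S2 together and computes S3 only after both results are available:

```javascript
// Minimal sketch: S1 and S2 are independent, so they may be started together;
// S3 needs both results, so it runs afterwards.
async function runSegment(b, c, e, f) {
  const s1 = async () => b + c; // S1: a = b + c
  const s2 = async () => e + f; // S2: d = e + f

  // No dependence between S1 and S2, so they can proceed concurrently.
  const [a, d] = await Promise.all([s1(), s2()]);

  // S3 is flow dependent on both S1 (a) and S2 (d).
  return a + d; // S3: g = a + d
}

runSegment(1, 2, 3, 4).then(g => console.log(g)); // prints 10
```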
However, checking for parallelism between the statements or processes of a program is not sufficient by itself. The decision to parallelize also depends on the following factors:
Number and types of processors available
How memory is structured and accessed
Data, control, and resource dependencies
As mentioned earlier, parallel computing requires that the segments to be run concurrently be independent of one another. Therefore, before implementing parallelism, all the conditions for parallelism between the segments need to be examined.
In this section, we discuss three kinds of dependencies between segments:
Data dependency: two or more instructions use the same data
Control dependency: dependencies that arise within control structures such as branches and loops
Resource dependency: instructions use the same shared resource
Data dependency refers to the situation where two or more instructions use the same data. The following kinds of data dependencies are identified:
Flow dependence: if instruction I2 follows I1 and the output of I1 becomes an input of I2, then I2 is said to be flow dependent on I1
Anti-dependence: when instruction I2 follows I1 such that the output of I2 overlaps with the input of I1 on the same data
Output dependence: when the outputs of the two instructions I1 and I2 overlap on the same data
I/O dependence: when read and write operations by two instructions are invoked on the same file
I1: a = b
I2: c = a + d
I3: a = c
This program segment contains instructions I1, I2, and I3 with the following dependencies: I2 is flow dependent on I1 (I2 reads a, which I1 writes); I3 is flow dependent on I2 (through c) and anti-dependent on I2 (I3 writes a, which I2 reads); and I3 is output dependent on I1 (both write a).
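These dependences can also be identified mechanically by comparing the read and write sets of each instruction. The following sketch (illustrative JavaScript; the data structure and helper names are our own, not from the original text) encodes the three instructions and prints the flow, anti-, and output dependences between each ordered pair:

```javascript
// Read and write sets for I1: a = b, I2: c = a + d, I3: a = c
const instrs = [
  { name: "I1", reads: ["b"],      writes: ["a"] },
  { name: "I2", reads: ["a", "d"], writes: ["c"] },
  { name: "I3", reads: ["c"],      writes: ["a"] },
];

// True if the two variable lists share at least one location.
const overlaps = (xs, ys) => xs.some(v => ys.includes(v));

for (let i = 0; i < instrs.length; i++) {
  for (let j = i + 1; j < instrs.length; j++) {
    const earlier = instrs[i], later = instrs[j];
    if (overlaps(earlier.writes, later.reads))
      console.log(`${later.name} is flow dependent on ${earlier.name}`);
    if (overlaps(earlier.reads, later.writes))
      console.log(`${later.name} is anti-dependent on ${earlier.name}`);
    if (overlaps(earlier.writes, later.writes))
      console.log(`${later.name} is output dependent on ${earlier.name}`);
  }
}
```

Running the sketch reports that I2 is flow dependent on I1, I3 is output dependent on I1, and I3 is both flow and anti-dependent on I2, matching the analysis above.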
Instructions or segments in a program often contain control structures such as branches and loops, so dependency among statements can also arise within control structures. The order in which instructions inside a control structure will execute is generally not known until run time.
for (i = 1; i <= n; i++) {
    if (x[i - 1] == 0)
        x[i] = 0;
    else
        x[i] = 1;
}
In the above control structure, the successive iterations are dependent on one another because the value of x[i] in iteration i depends on the value of x[i-1] from the previous iteration.
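For contrast, a loop whose iterations use only their own elements has no loop-carried dependence, and its iterations could in principle be distributed across processors. The sketch below (illustrative JavaScript with small made-up arrays) shows the two forms side by side:

```javascript
const n = 4;
const x = [0, 0, 0, 0, 0];            // hypothetical data, for illustration only
const a = [0, 1, 2, 3, 4];
const b = [0, 10, 20, 30, 40];
const y = new Array(n + 1);

// Loop-carried dependence: iteration i reads x[i - 1], which iteration i - 1
// writes, so the iterations must execute in order.
for (let i = 1; i <= n; i++) {
  x[i] = (x[i - 1] == 0) ? 0 : 1;
}

// No loop-carried dependence: each iteration touches only its own elements,
// so the iterations could be divided among processors.
for (let i = 1; i <= n; i++) {
  y[i] = a[i] + b[i];
}
```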
Parallelism between instructions can also be affected by shared resources. If two instructions use the same shared resource, a resource dependency exists, for example:
When floating point units or registers are shared
When memory is being shared
To execute instructions or blocks of instructions in parallel, it must be ensured that the instructions are independent of one another. Instructions can be data dependent, control dependent, or resource dependent on each other. Here we consider only data dependency among statements when deciding on parallel execution.
A. J. Bernstein elaborated the analysis of data dependency and derived conditions for deciding whether instructions or processes can be executed in parallel. Bernstein's conditions are based on the following two sets of variables:
The read (input) set RI, which consists of the memory locations read by statement or instruction SI
The write (output) set WI, which consists of the memory locations written by SI
The sets RI and WI are not necessarily disjoint, since the same memory location may be both read and written by SI.
The following are Bernstein's conditions for parallelism, used to determine whether two statements S1 and S2 can be executed in parallel:
Locations in R1 from which S1 reads and the locations W2 onto which S2 writes must be mutually exclusive: R1 ∩ W2 = φ
Locations in R2 from which S2 reads and the locations W1 onto which S1 writes must be mutually exclusive: R2 ∩ W1 = φ
The memory locations W1 and W2 onto which S1 and S2 write, should not overlap: W1 ∩ W2 = φ
To show the operation of Bernstein's conditions, consider the following instructions of a sequential program:
I1: x = (a + b) / (a * b)
I2: y = (b + c) * d
I3: z = x² + (a * e)
Now, the read set and write set of I1, I2 and I3 are as follows:
R1 = {a, b}
W1 = {x}
R2 = {b, c, d}
W2 = {y}
R3 = {x, a, e}
W3 = {z}
Now let us find out whether I1 and I2 can be executed in parallel:
R1 ∩ W2 = {a, b} ∩ {y} = φ
R2 ∩ W1 = {b, c, d} ∩ {x} = φ
W1 ∩ W2 = {x} ∩ {y} = φ
That means I1 and I2 are independent of each other and can be executed in parallel.
Similarly, for I1 || I3:
R1 ∩ W3 = {a, b} ∩ {z} = φ
R3 ∩ W1 = {x, a, e} ∩ {x} = {x} ≠ φ
W1 ∩ W3 = {x} ∩ {z} = φ
Hence I1 and I3 are not independent of each other and cannot be executed in parallel.
For I2 || I3:
R2 ∩ W3 = {b, c, d} ∩ {z} = φ
R3 ∩ W2 = {x, a, e} ∩ {y} = φ
W2 ∩ W3 = {y} ∩ {z} = φ
Hence, I2 and I3 are independent of each other and can be executed in parallel.
I1 || I2: can run in parallel
I1 || I3: cannot run in parallel
I2 || I3: can run in parallel
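The same checks can be carried out programmatically. The sketch below (illustrative JavaScript, with the read and write sets taken from the example above) evaluates the three Bernstein conditions for each pair and reproduces the results just listed:

```javascript
// Read and write sets from the example above.
const R = { I1: ["a", "b"], I2: ["b", "c", "d"], I3: ["x", "a", "e"] };
const W = { I1: ["x"],      I2: ["y"],           I3: ["z"] };

// True if the two variable lists have no memory location in common.
const disjoint = (xs, ys) => xs.every(v => !ys.includes(v));

// Bernstein's conditions: Ri ∩ Wj = φ, Rj ∩ Wi = φ, Wi ∩ Wj = φ
const canRunInParallel = (i, j) =>
  disjoint(R[i], W[j]) && disjoint(R[j], W[i]) && disjoint(W[i], W[j]);

console.log("I1 || I2:", canRunInParallel("I1", "I2")); // true: can run in parallel
console.log("I1 || I3:", canRunInParallel("I1", "I3")); // false: R3 ∩ W1 = {x}
console.log("I2 || I3:", canRunInParallel("I2", "I3")); // true: can run in parallel
```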
Grain size, or granularity, is a measure of how much computation is involved in a process. Grain size is determined by counting the number of instructions in a program segment.
Fine grain: contains approximately fewer than 20 instructions
Medium grain: contains approximately fewer than 500 instructions
Coarse grain: contains approximately 1,000 instructions or more
Fine Grain Example:
a = b + c
d = e + f
g = a + d
Medium Grain Example:
function calculateSum(arr) {
let sum = 0;
for (let i = 0; i < arr.length; i++) {
sum += arr[i];
}
return sum;
}
Coarse Grain Example:
function processImage(image) {
// Image processing algorithm
// Hundreds or thousands of instructions
return processedImage;
}
Based on grain size, parallelism in a program can be categorized into different levels. These parallelism levels form a hierarchy in which processes become finer-grained at the lower levels; as the level increases, the degree of parallelism decreases.
Instruction level: the most basic level, with the highest degree of parallelism. Fine grain size, with just a few instructions per grain.
Loop level: involves parallelizing the iterations of loops. Also fine grain size. Simple loops are easy to parallelize.
Procedure level: consists of procedures, subroutines, or subprograms. Medium grain size, with up to thousands of instructions per procedure.
Program level: the highest level, consisting of independent programs. Coarse grain size, with tens of thousands of instructions per program.
| Parallelism Level | Grain Size | Characteristics | Implementation |
|---|---|---|---|
| Instruction Level | Fine | Highest degree of parallelism, more overhead | Hardware (superscalar, VLIW) |
| Loop Level | Fine | Simple loops easy to parallelize, recursive loops difficult | Compilers can achieve automatically |
| Procedure Level | Medium | Exploited by programmers; not yet automated by compilers | Programmer (manual parallelization) |
| Program Level | Coarse | Tens of thousands of instructions, independent programs | Time sharing, operating system |
| Grain Size | Parallelism Level | Typical System Type |
|---|---|---|
| Fine Grain | Instruction or Loop Level | SIMD organization |
| Medium Grain | Procedure or Subprogram Level | Loosely coupled systems |
| Coarse Grain | Program Level | Tightly coupled or shared memory multiprocessors (e.g., Cray Y-MP) |
Tightly coupled (shared memory) systems: typically used for coarse grain parallelism. Examples include shared memory multiprocessors such as the Cray Y-MP.
Loosely coupled systems: utilized to execute medium grain program segments. Distributed memory systems fall into this category.
SIMD organization: fine grain parallelism is exploited in SIMD (Single Instruction, Multiple Data) computers.
Fine grain parallelism requires more communication and synchronization overhead
Coarse grain parallelism is generally more scalable to larger systems
Fine grain parallelism is often more complex to program effectively
Flynn's classification: based on instruction and data streams (SISD, SIMD, MISD, MIMD)
Handler's classification: categorizes computers at three distinct tiers: PCU, ALU, and BLC
Structural classification: tightly coupled (shared memory) and loosely coupled (distributed memory) systems
Grain size classification: based on the amount of computation in a process (fine, medium, coarse)
Dependencies: data, control, and resource dependencies affect parallelism
Bernstein's conditions: mathematical conditions to determine whether instructions can run in parallel
Grain sizes: fine (<20 instructions), medium (<500 instructions), coarse (≥1000 instructions)
Parallelism levels: instruction, loop, procedure, and program levels, with decreasing degree of parallelism
Understanding grain size classification is essential for designing efficient parallel computing systems. The choice of grain size affects the system's performance, scalability, and programming complexity. Fine-grained parallelism offers high parallelism but with more overhead, while coarse-grained parallelism is easier to implement but may not utilize all available resources optimally.
By analyzing dependencies using Bernstein conditions and selecting the appropriate grain size for the target architecture, we can design parallel systems that effectively balance parallelism with overhead, leading to optimal performance for a wide range of applications.